Conversation

vblagoje
Member

@vblagoje vblagoje commented Sep 9, 2025

Why

This PR migrates the Langfuse integration from Langfuse v2 to v3. Langfuse v3 is built on OpenTelemetry (OTel), which differs from v2's trace creation model. The migration introduces an expected extra root span level due to OTel's requirement that trace attributes must be held by a span, unlike v2's direct trace creation capability.
TLDR - you'll see an extra nested root span.
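
For illustration, a minimal sketch of the difference (a hedged example, not code from this PR; the span names are made up, and the two halves assume the v2 and v3 SDKs respectively, which cannot be installed side by side):

# v2 SDK: a trace object exists on its own and can hold attributes directly.
from langfuse import Langfuse  # langfuse < 3

langfuse_v2 = Langfuse()
trace = langfuse_v2.trace(name="haystack.pipeline.run", user_id="u1", session_id="s1")
span = trace.span(name="component.run")
span.end()

# v3 SDK (OTel-based): attributes live on spans, so the pipeline run becomes a
# root span under the trace - hence the extra nesting level mentioned above.
from langfuse import get_client  # langfuse >= 3

langfuse_v3 = get_client()
with langfuse_v3.start_as_current_span(name="haystack.pipeline.run") as root_span:
    root_span.update_trace(user_id="u1", session_id="s1")
    with langfuse_v3.start_as_current_span(name="component.run"):
        pass  # component work happens here
langfuse_v3.flush()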

Proposed Changes

  • Complete migration from Langfuse v2 to v3 API
  • Users will not notice any difference in traces (other than the extra root span)

How did you test it?

  • Updated existing unit tests to work with v3 API
  • Adjusted some tests to work with extra root span
  • Manually tested in a complex multi-agent itinerary setup. Here is the trace. Note the double top nested root span. Here is how the trace looks in v2. Hassieb from Langfuse told me: "exactly, it is expected that there is one root span under the trace, as OTel does not have a concept of a trace that holds attributes (whereas Langfuse does). So the behavior you are seeing is expected"

Notes for the reviewer

The extra root span level is expected behavior in v3 due to OTel's architecture - it's not a bug but a fundamental difference from v2. Some tests needed to be updated to account for this.

@github-actions github-actions bot added the integration:langfuse and type:documentation labels Sep 9, 2025
@vblagoje vblagoje marked this pull request as ready for review September 10, 2025 09:35
@vblagoje vblagoje requested a review from a team as a code owner September 10, 2025 09:35
@vblagoje vblagoje requested review from julian-risch and removed request for a team September 10, 2025 09:35
@vblagoje
Member Author

vblagoje commented Sep 10, 2025

@LastRemote would you please, at your convenience, try out this PR on the async setup we previously used to test #2207? It would be amazing to nail this transition to v3 without regressions or additional cleanups. 🙏

@sjrl
Contributor

sjrl commented Sep 11, 2025

@vblagoje just so you are aware, I also opened a PR here #2257 that enables linting on the examples and tests folders. Given that you reworked the Langfuse integration in this PR, feel free to close mine if it no longer makes sense.

@vblagoje
Member Author

@LastRemote did you have a chance to run this branch on your async setup? 🙏

@LastRemote
Contributor

@vblagoje Not yet. We had some trouble self-hosting Langfuse v3 so I haven’t started. I am planning to do this at some point this week after my experiment with critic agents.


@Copilot Copilot AI left a comment


Pull Request Overview

This PR migrates the Langfuse integration from v2 to v3, which is built on OpenTelemetry (OTel). The migration introduces an expected extra root span level due to OTel's requirement that trace attributes must be held by a span, unlike v2's direct trace creation capability.

Key changes:

  • Updated Langfuse dependency from >=2.9.0, <3.0.0 to >=3.0.0, <4.0.0
  • Refactored span creation to use OTel-based context managers instead of direct trace/span creation
  • Updated tests to accommodate the new API and extra root span behavior

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files:

  • pyproject.toml: Updates Langfuse dependency to v3
  • tracer.py: Core migration from v2 to v3 API with context manager-based span creation
  • test_tracer.py: Updates unit tests with new mock structure for v3 API
  • test_tracing.py: Updates integration tests to handle extra root span and configurable host

@@ -456,4 +479,4 @@ def get_trace_id(self) -> str:
     Return the trace ID.
     :return: The trace ID.
     """
-    return self._tracer.get_trace_id()
+    return self._tracer.get_current_observation_id()

Copilot AI Sep 16, 2025


The method get_trace_id() should return a trace ID, but it's calling get_current_observation_id() which returns an observation ID. This appears to be incorrect - it should likely call get_current_trace_id() instead.

Suggested change:
-    return self._tracer.get_current_observation_id()
+    return self._tracer.get_current_trace_id()


Member


I agree that there is a name mismatch here that we should at least explain, or fix if the intention is to call get_current_trace_id.

Member Author


Yes, I'll rename it - both methods have exactly the same semantics!

@@ -119,7 +122,6 @@ def test_tracing_integration(llm_class, env_var, expected_trace, basic_pipeline)
)
@pytest.mark.integration
def test_tracing_with_sub_pipelines():

@component

Copilot AI Sep 16, 2025


Remove the extra empty line before the @component decorator to maintain consistent spacing.


Member

@julian-risch julian-risch left a comment


Looks quite good to me already! Just some redundant code blocks in the tests regarding mocking. The biggest question is about calling get_current_observation_id in get_trace_id.

Comment on lines 334 to 336
mock_client.start_as_current_span.return_value = MockContextManager()
mock_client.start_as_current_observation.return_value = MockContextManager()
mock_client.get_current_trace_id.return_value = "mock_trace_id_123"
Member


These three lines seem redundant given that in the line above we call mock_get_client, which already sets:

    mock_client.start_as_current_span = Mock(return_value=MockContextManager())
    mock_client.start_as_current_observation = Mock(return_value=MockContextManager())
    mock_client.get_current_trace_id = Mock(return_value="mock_trace_id_123")

This occurs a couple of times in the file, for example in test_trace_generation and test_trace_generation_invalid_start_time. Just check all the places where we call mock_get_client().
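
For reference, a rough sketch of what such a shared helper could look like (mock_get_client and MockContextManager mirror the names quoted above; the actual fixture in test_tracer.py may differ):

from unittest.mock import Mock

class MockContextManager:
    # Stand-in for the context manager returned by the v3 start_as_current_* APIs.
    def __enter__(self):
        return Mock()

    def __exit__(self, *exc_info):
        return False

def mock_get_client():
    # Configure the mocked Langfuse client once so individual tests
    # don't have to repeat these three assignments.
    mock_client = Mock()
    mock_client.start_as_current_span = Mock(return_value=MockContextManager())
    mock_client.start_as_current_observation = Mock(return_value=MockContextManager())
    mock_client.get_current_trace_id = Mock(return_value="mock_trace_id_123")
    return mock_client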

if trace_attrs:
# We need to get the actual span from the context manager
# For now, we'll skip this as the context manager needs to be entered
pass
Member


This code block doesn't do anything. Could it be removed? What does "For now" mean? :)


@julian-risch
Member

My comments from our earlier conversation

Content pages to check if there are any updates needed:

Do you think we could leave a note somewhere in the docs about how to use the API SDK version 2 with langfuse-haystack<=2.3.0 or langfuse-haystack<3.0.0?

Any new features that users benefit from that we can highlight? Improved simplicity?
The key changes that I read about don't explain well enough how the user benefits from the upgrade. If there is no clear benefit, that's also okay, but maybe you know of any?

  • Context Management: OpenTelemetry now handles context propagation automatically, reducing the need to manually pass IDs.
  • The name parameter is now required when creating spans and generations.
  • langfuse.create_trace_id()
  • LlamaIndex: There is no longer a Langfuse-specific LlamaIndex integration; use third-party OTEL-based instrumentations. However, we are convinced that a Langfuse-specific integration is the better approach
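
A minimal sketch of the v3 usage pattern behind the points above (a hedged example; the span/generation names and model are illustrative, not taken from this PR):

from langfuse import get_client

langfuse = get_client()

# The name parameter is now required when creating spans and generations.
with langfuse.start_as_current_span(name="itinerary-request"):
    # OpenTelemetry propagates context automatically, so this generation
    # attaches to the active span without passing any IDs around.
    with langfuse.start_as_current_generation(name="llm-call", model="gpt-4o-mini") as gen:
        gen.update(output="...")

# create_trace_id() mints a trace ID up front if you need to reference it externally.
trace_id = langfuse.create_trace_id()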

Shall we ask to get listed here? https://langfuse.com/changelog/2025-06-05-python-sdk-v3-generally-available

@vblagoje
Member Author

Do you think we could leave a note somewhere in the docs about how to use the API SDK version 2 with langfuse-haystack<=2.3.0 or langfuse-haystack<3.0.0?

I really don't think we are adding any amazing value by moving from 2.x to 3.x.

Any new features that users benefit from that we can highlight? Improved simplicity? Key changes that I read about don't explain well enough how the user benefits from the upgrade. If there is no clear benefit, that's also okay but maybe you know of any?

I don't think there is a great benefit, except that if users already use some other feature of Langfuse, e.g. prompt management, they don't have to bundle two versions of the same library in their deployments.

Shall we ask to get listed here? https://langfuse.com/changelog/2025-06-05-python-sdk-v3-generally-available

We can; I'll ask Hassieb.

@vblagoje
Member Author

Have another look @julian-risch 🙏

Member

@julian-risch julian-risch left a comment


LGTM! 👍 Nothing else that I can spot in a code review but testing it out a bit more wouldn't hurt for sure.

@vblagoje
Member Author

LGTM! 👍 Nothing else that I can spot in a code review but testing it out a bit more wouldn't hurt for sure.

Yes, let's hear from @LastRemote - he's using the Langfuse integration in all sorts of edge cases. If he says this PR doesn't break any of them, I'd be very confident about going forward with this PR.

@LastRemote
Contributor

LastRemote commented Sep 18, 2025

he's using langfuse integration in all sorts of edge cases.

@vblagoje lol true. I did lots of customizations around Agent and ChatMessage, and in particular I extended the base class LangfuseSpan to support fancy multimodal and reasoning content in customized ChatMessage.

That being said, the new context-manager approach of LangfuseSpan is a little harder to work with since I cannot do something like CustomizedLangfuseSpan(langfuse_span.raw_span()) anymore, but this is not a big problem.

I switched to the publicly hosted Langfuse to test this out. I am not seeing session-id/user-id in Langfuse traces, though I found this information in the logs. Here are the full trace details:

2025-09-18 15:58:58.015 | DEBUG    | langfuse._client.span_processor:on_end:122 - Trace: Processing span name='main' | Full details:
{
  "name": "main",
  "context": {
    "trace_id": "3f075c6fd6493c693434bcefb4e4039b",
    "span_id": "329faba9d9115b47",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": null,
  "start_time": "2025-09-18T07:58:49.775238Z",
  "end_time": "2025-09-18T07:58:58.014925Z",
  "status": {
    "status_code": "UNSET"
  },
  "attributes": {
    "langfuse.trace.name": "main",
    "user.id": "test",
    "session.id": "foobar",
    "langfuse.version": 45292,
    "langfuse.trace.tags": [
      "bonjour",
      "nihao",
      "hola"
    ],
    "langfuse.trace.public": false,
    "langfuse.observation.metadata.haystack.pipeline.input_data": "{\"56c3cb2b-1843-4a09-9517-ce69154181e0\": {\"query\": \"Hello\", \"_history\": []}}",
    "langfuse.observation.metadata.haystack.pipeline.output_data": "{}",
    "langfuse.observation.metadata.haystack.pipeline.metadata": "{}",
    "langfuse.observation.metadata.haystack.pipeline.max_runs_per_component": 100,
    "langfuse.observation.type": "span",
    "langfuse.observation.input": "{\"56c3cb2b-1843-4a09-9517-ce69154181e0\": {\"query\": \"Hello\", \"_history\": []}}",
    "langfuse.observation.output": "<output>"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "telemetry.sdk.language": "python",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.37.0",
      "langfuse.release": "sit",
      "service.name": "unknown_service"
    },
    "schema_url": ""
  },
  "instrumentationScope": {
    "name": "langfuse-sdk",
    "version": "3.4.0",
    "schema_url": "",
    "attributes": {
      "public_key": "<redacted>"
    }
  }
}

[Screenshot: Langfuse trace view, 2025-09-18 16:04]

@vblagoje
Member Author

@LastRemote what about those async traces - do they still work? I don't think we have "official" support for session IDs; it's still in this DIY phase, as suggested by @sjrl in this comment.

@LastRemote
Contributor

LastRemote commented Sep 19, 2025

@vblagoje Asyncs are fine. But I thought session-id/user-id were working? There was code supporting this back when we introduced tracing_context_var (now in L283-L292)?

@vblagoje
Member Author

@vblagoje Asyncs are fine. But I thought session-id/user-id were working? There was code supporting this back when we introduced tracing_context_var (now in L283-L292)?

I know - I was confused myself as I could not find any code where values in tracing_context_var are set. I went all the way back to December last year through a few git tree snapshots and found no place where tracing_context_var values are set. I'll just take my colleague's comment as ground truth, and we can talk about how to add session IDs and other tags to a trace separately.

@LastRemote
Contributor

LastRemote commented Sep 19, 2025

I could not find any code where values in tracing_context_var are set...

You are right that these are not explicitly set anywhere in this repository. Those are variables to be set and managed by pipeline providers / service providers to log multi-round or multi-user applications.

For example, I am currently deploying pipelines as services, and I allow the API caller (they should be able to define what a session/user is in their intended use case) to include session-id/user-id/tags in the headers. Then I populate tracing_context_var from this information and expect it to be passed on to Langfuse and show up in the traces.
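
Roughly, that pattern looks like the sketch below (an assumption-laden example: it presumes tracing_context_var is the ContextVar exposed by the integration's tracer module and that it reads session_id/user_id/tags keys; the header names are made up):

from haystack_integrations.tracing.langfuse.tracer import tracing_context_var

def run_pipeline_for_request(pipeline, payload, headers):
    # The API caller decides what a session/user means for their use case.
    tags_header = headers.get("X-Trace-Tags")
    token = tracing_context_var.set({
        "session_id": headers.get("X-Session-Id"),
        "user_id": headers.get("X-User-Id"),
        "tags": tags_header.split(",") if tags_header else [],
    })
    try:
        return pipeline.run(payload)
    finally:
        tracing_context_var.reset(token)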

@vblagoje
Member Author

I could not find any code where values in tracing_context_var are set...

You are right that these are not explicitly set anywhere in this repository. Those are variables to be set and managed by pipeline providers / service providers to log multi-round or multi-user applications.

For example, I am currently deploying pipelines as services, and I allow the API caller (they should be able to define what a session/user is in their intended use case) to define session-id/user-id/tags in the headers. Then I populate tracing_context_var from this information and expect it to be passed on to Langfuse and show up in the traces.

Understood, that makes sense. So what you are saying is that they don't show up now, but they do show up in the current version of langfuse-haystack, not with this PR?

@LastRemote
Contributor

I could not find any code where values in tracing_context_var are set...

You are right that these are not explicitly set anywhere in this repository. Those are variables to be set and managed by pipeline providers / service providers to log multi-round or multi-user applications.
For example, I am currently deploying pipelines as services, and I allow the API caller (they should be able to define what a session/user is in their intended use case) to define session-id/user-id/tags in the headers. Then I populate tracing_context_var from this information and expect it to be passed on to Langfuse and show up in the traces.

Understood, that makes sense. So what you are saying is that they don't show up now, but they do show up in the current version of langfuse-haystack, not with this PR?

Yes. They were working correctly until then. I am still able to find them in the span details (full log above), but it seems like they are no longer recognized by Langfuse.

@vblagoje
Member Author

They work for me @LastRemote - I used the following itinerary agent snippet:

token = None
try:
    # Get user input interactively
    user_input = questionary.text(
        "What kind of itinerary would you like me to create for you? For example: A 4-day trip in the south of France."
    ).ask()

    if not user_input:
        print("No input provided. Exiting...")
        return

    # Create a hash of the user input to use as a session ID
    session_id_hash = hashlib.md5(user_input.encode()).hexdigest()
    token = set_langfuse_context(session_id=session_id_hash, user_id="vblagoje", version="1.0.0")
    response = macro_itinerary_agent.run(
        messages=[ChatMessage.from_user(text=user_input)]
    )
    if not use_streaming:
        print(response["messages"][-1].text)
finally:
    for tool in all_tools:
        if isinstance(tool, MCPToolset):
            tool.close()
    reset_langfuse_context(token)
[Screenshot: Langfuse trace view, 2025-09-19 11:25]

So to me it seems to work. The functions set_langfuse_context and reset_langfuse_context are from here.
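
For readers without access to that repo, those helpers are presumably thin wrappers around the same context var; a hypothetical sketch only, the actual functions may differ:

from haystack_integrations.tracing.langfuse.tracer import tracing_context_var

def set_langfuse_context(**context):
    # e.g. session_id, user_id, version; the returned token is used to reset later.
    return tracing_context_var.set(context)

def reset_langfuse_context(token):
    if token is not None:
        tracing_context_var.reset(token)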

@LastRemote
Contributor

@vblagoje Interesting, let me double-check my code. Have you observed a similar span object to the one I shared above?

@vblagoje
Member Author

No, I didn't - I'm using LangfuseConnector; the code is at https://github.com/vblagoje/itinerary-agent. I'm getting the above trace by adding the context var to that itinerary agent. That's all.

@LastRemote
Contributor

I am so confused. I just realized I am also not seeing the root pipeline span (which might be the reason), although I should not have customized that part.

Feel free to merge; I think this issue is much more likely to be on my end. I will share more updates when I understand the situation better.

@LastRemote
Contributor

LastRemote commented Sep 19, 2025

And may I ask where the root trace/pipeline span is being created? Would this be related to async processing?

@vblagoje
Member Author

vblagoje commented Sep 19, 2025

Yes, the root level is no longer what it used to be. To follow OTel, Langfuse changed it so we have an additional root level now; see the description of this PR.

@vblagoje vblagoje merged commit b506695 into main Sep 19, 2025
11 checks passed
@vblagoje vblagoje deleted the langfuse_v3_upgrade branch September 19, 2025 13:03
@vblagoje
Member Author

Having confirmed that complex agent pipelines are properly traced, and async pipelines as well (🙏 @LastRemote), along with proof that users can still use the context var (see above), I moved forward and integrated this PR. The release is https://pypi.org/project/langfuse-haystack/3.0.0/.
We can fix the 3.x release branch if we find issues that popped up in this migration.

Successfully merging this pull request may close these issues.

Update Langfuse integration for Python SDK v3